Generating Diverse and Accurate Visual Captions by Comparative Adversarial Learning
Authors
Abstract
We study how to generate captions that are not only accurate in describing an image but also discriminative across different images. The problem is both fundamental and interesting, as most machine-generated captions, despite phenomenal research progress in the past several years, are expressed in a very monotonic and featureless format. While such captions are normally accurate, they often lack important characteristics of human language: distinctiveness for each caption and diversity across different images. To address this problem, we propose a novel conditional generative adversarial network for generating diverse captions across images. Instead of estimating the quality of a caption solely on one image, the proposed comparative adversarial learning framework better assesses the quality of captions by comparing a set of captions within the image-caption joint space. By contrasting with human-written captions and image-mismatched captions, the caption generator effectively exploits the inherent characteristics of human languages and generates more discriminative captions. We show that our proposed network is capable of producing accurate and diverse captions across images.
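The comparative idea described above can be illustrated with a small sketch: rather than scoring one image-caption pair in isolation, a discriminator embeds the image and a set of candidate captions (generated, human-written, and image-mismatched) into a joint space and assigns each caption a relative probability via a softmax over their similarities. This is only a minimal illustration of the scoring principle, not the authors' actual network; the embeddings, the `temperature` parameter, and the function name are assumptions for the example.

```python
import numpy as np

def comparative_scores(image_emb, caption_embs, temperature=1.0):
    """Score each caption relative to the others for one image.

    image_emb:    (d,) image embedding in the joint space
    caption_embs: (n, d) embeddings of n candidate captions
    Returns an (n,) probability vector: each caption's quality
    depends on how it compares with the rest of the set.
    """
    # Cosine similarity between the image and every candidate caption.
    img = image_emb / np.linalg.norm(image_emb)
    caps = caption_embs / np.linalg.norm(caption_embs, axis=1, keepdims=True)
    sims = caps @ img
    # Softmax over the caption set: scores are inherently comparative,
    # so a generic caption that fits many images is penalized relative
    # to a distinctive one that fits this image best.
    logits = sims / temperature
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()
```

Because the scores are normalized over the whole caption set, the generator is pushed toward captions that beat both the mismatched and the generic alternatives, which is what encourages distinctiveness.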
Similar resources
Bootstrap, Review, Decode: Using Out-of-Domain Textual Data to Improve Image Captioning
State-of-the-art approaches for image captioning require supervised training data consisting of captions with paired image data. These methods are typically unable to use unsupervised data such as textual data with no corresponding images, which is a much more abundant commodity. We here propose a novel way of using such textual data by artificially generating missing visual information. We eva...
Text-Guided Attention Model for Image Captioning
Visual attention plays an important role in understanding images and has demonstrated its effectiveness in generating natural language descriptions of images. On the other hand, recent studies show that language associated with an image can steer visual attention in the scene during our cognitive process. Inspired by this, we introduce a text-guided attention model for image captioning, which learns t...
The Comparative Effect of Visual vs. Auditory Input Enhancement on Learning Non-Congruent Phrasal Verbs by Iranian EFL Learners
Vocabulary is one of the essential components of language and learning phrasal verbs as part of vocabulary is quite challenging for foreign language learners. The present study aimed at investigating the effects of visual and auditory input enhancement on learning non-congruent phrasal verbs. The participants of the study were 90 intermediate English language learners who were divided into two ...
The effects of captioning texts and caption ordering on L2 listening comprehension and vocabulary learning
This study investigated the effects of captioned texts on second/foreign (L2) listening comprehension and vocabulary gains using a computer multimedia program. Additionally, it explored the caption ordering effect (i.e. captions displayed during the first or second listening), and the interaction of captioning order with the L2 proficiency level of language learners in listening comprehension a...
Adversarial-Playground: A Visualization Suite for Adversarial Sample Generation
With growing interest in adversarial machine learning, it is important for practitioners and users of machine learning to understand how their models may be attacked. We present a web-based visualization tool, ADVERSARIALPLAYGROUND, to demonstrate the efficacy of common adversarial methods against a convolutional neural network. ADVERSARIAL-PLAYGROUND provides users an efficient and effective e...
Publication date: 2018